Skip to content

Conversation

@Amazing-Stardom
Copy link
Contributor

Description

Fixes # (issue)

Type of Change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Style/UI improvements
  • Code refactoring
  • Performance improvements
  • Test additions/updates

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration.

  • Test A
  • Test B

Screenshots (if applicable)

Add screenshots to help explain your changes.

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules

Additional Notes

Add any other context about the pull request here.

@livereviewbot
Copy link

Implement XXHash Caching and Distributed Locks

Overview

This change introduces a persistent XXHash caching mechanism for local files. It implements a new distributed locking system for B2 storage. Core rclone operations and configuration initialization are significantly refactored.

Technical Highlights

  • b2-manager/core/hash.go: Implements XXHash algorithm and a persistent hash.json cache for local file integrity checks.
  • b2-manager/core/handleFiles.go: Introduces a distributed locking mechanism using remote .lock files on B2 storage.
  • b2-manager/core/rclone.go: Refactors rclone operations into generic RcloneCopy, LsfRclone, and RcloneDeleteFile helpers.
  • b2-manager/config/config.go: Modularizes configuration initialization and establishes a new .b2m local data storage directory.
  • b2-manager/core/status.go: Changes database status data fetching from parallel to sequential execution, leveraging LsfRclone.
  • b2-manager/core/upload.go: Adds a pre-upload safety check to prevent accidental overwrites of newer remote databases.

Impact

  • Functionality:
    • Enables manual database locking for maintenance and provides concurrent update warnings during downloads.
    • Accelerates local file integrity checks through XXHash and a persistent hash cache.
    • Improves upload safety by preventing overwrites of newer remote database versions.
    • Streamlines remote operations with generalized rclone helpers and optimized status fetching.
  • Risk:
    • Lock acquisition is not atomic, potentially leading to race conditions for concurrent operations.
    • Switching from SHA256 to XXHash reduces cryptographic strength for integrity checks.
    • status.go removed explicit error handling for errLocal, errRemote, etc., which needs verification.
    • force unlock requires robust dbName validation to prevent unintended remote file deletions.

SyncStatusDifferent = "*"
)

// InitializeConfig sets up global configuration variables

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed sync status constants. Clarify if these constants are replaced or if the sync status logic has been removed entirely.

}

// Load config from b2m.toml
// Load config from b2m.toml

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is duplicated.

// Fetch user details
fetchUserDetails()

AppConfig.LocalDBDir = filepath.Join(AppConfig.ProjectRoot, "db", "all_dbs")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The path db/all_dbs is hardcoded. Consider making all_dbs a configurable constant if it might change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will use in toml file

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Failed to generate AI response

This issue has been logged and will be investigated.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like a good plan. Moving all_dbs to the toml file will make it configurable and addresses the concern about hardcoding.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update. Using the TOML file for all_dbs will effectively address the hardcoding concern. Please ensure this change is implemented to make the path configurable.

fetchUserDetails()

AppConfig.LocalDBDir = filepath.Join(AppConfig.ProjectRoot, "db", "all_dbs")
AppConfig.LocalVersionDir = filepath.Join(AppConfig.LocalDBDir, ".b2m", "version")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new .b2m directory is introduced. Ensure this change is communicated to users regarding local file structure.

return nil
}

func findProjectRoot() (string, error) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The findProjectRoot function now searches for db or go.mod. This is a robust approach, but go.mod might not always indicate the project root for non-Go specific parts.

}
parent := filepath.Dir(dir)
if parent == dir {
return "", fmt.Errorf("root not found (searched for 'db' dir or 'go.mod')")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message root not found is generic. Specify which of the two conditions (db dir or go.mod) failed.

LogError("DownloadDatabase rclone copy failed for %s: %v", dbName, err)
return fmt.Errorf("download of %s failed: %w", dbName, err)
}
description := "Downloading " + dbName

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The description parameter for RcloneCopy is hardcoded. Consider making it more dynamic or passing it from the caller for better context in logs.

// 3.1. Calculate Local Hash of the newly downloaded file
localDBPath := filepath.Join(config.AppConfig.LocalDBDir, dbName)
localHash, err := CalculateSHA256(localDBPath)
localHash, err := CalculateXXHash(localDBPath)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed from CalculateSHA256 to CalculateXXHash. XXHash is faster but not cryptographically secure. Confirm if this change aligns with the security requirements for integrity checks.

)

// LockDatabase creates a .lock file
func LockDatabase(ctx context.Context, dbName, owner, host, intent string, force bool) error {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The force parameter for LockDatabase allows overriding existing locks. This could lead to data corruption if not used carefully.

filename := fmt.Sprintf("%s.%s.%s.lock", dbName, owner, host)

// If forcing, we first clean up ALL existing locks for this DB to ensure we start fresh.
if force {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When force is true, all existing locks for the specific database are cleaned up. This scope (dbName.*.lock) is crucial and prevents accidental deletion of other database locks.

}
defer os.Remove(tmpFile)

// Use RcloneCopy to upload the lock file

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The FetchLocks followed by RcloneCopy is not an atomic operation. A race condition could occur where another client acquires a lock between the FetchLocks call and the RcloneCopy operation.

}

// UnlockDatabase removes the .lock file
func UnlockDatabase(ctx context.Context, dbName, owner string, force bool) error {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The force parameter for UnlockDatabase allows deleting all locks for a database. This is powerful and potentially dangerous if misused.

if force {
// Use rclone delete with include pattern
// Pattern: dbName.*.lock
pattern := fmt.Sprintf("%s.*.lock", dbName)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pattern for rclone delete uses dbName.*.lock. If dbName itself contains wildcards, this could lead to unintended deletions.


// cachedHash stores the hash and file stat info to avoid re-hashing unchanged files
type cachedHash struct {
Hash string

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using time.Time directly for ModTime instead of int64 (UnixNano). This improves type safety and readability.

}

var (
fileHashCache = make(map[string]cachedHash)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Global variables for cache and mutex can lead to tight coupling and make testing difficult.

func CalculateXXHash(filePath string) (string, error) {
info, err := os.Stat(filePath)
if err != nil {
LogError("CalculateXXHash: Failed to stat file %s: %v", filePath, err)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LogError call duplicates the error return. The caller will also receive the error.


if ok && cached.ModTime == info.ModTime().UnixNano() && cached.Size == info.Size() {
return cached.Hash, nil
} else {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The else keyword is unnecessary here. The if block returns, so the following code is implicitly the else branch.

if ok && cached.ModTime == info.ModTime().UnixNano() && cached.Size == info.Size() {
return cached.Hash, nil
} else {
LogInfo("Cache miss for %s. Cached: %v, Current: ModTime=%d, Size=%d", filepath.Base(filePath), ok, info.ModTime().UnixNano(), info.Size())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logging filepath.Base(filePath) might be insufficient for debugging if multiple files with the same base name exist in different directories.

// Calculate hash
f, err := os.Open(filePath)
if err != nil {
LogError("CalculateXXHash: Failed to open file %s: %v", filePath, err)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LogError call duplicates the error return. The caller will also receive the error.

// Use streaming digest
h := xxh3.New()
if _, err := io.Copy(h, f); err != nil {
LogError("CalculateXXHash: io.Copy failed for %s: %v", filePath, err)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LogError call duplicates the error return. The caller will also receive the error.

}

// Sum64 returns uint64, format as hex string for compatibility
hash := fmt.Sprintf("%016x", h.Sum64())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hash is formatted as a 16-character hexadecimal string. Ensure this format is consistent with how hashes are stored and compared elsewhere in the system.


data, err := os.ReadFile(cachePath)
if err != nil {
LogError("LoadHashCache: Failed to read cache file: %v", err)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LogError call duplicates the error return. The caller will also receive the error.

fileHashCacheMu.Lock()
defer fileHashCacheMu.Unlock()

if err := json.Unmarshal(data, &fileHashCache); err != nil {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LogError call duplicates the error return. The caller will also receive the error.


if err := json.Unmarshal(data, &fileHashCache); err != nil {
LogError("LoadHashCache: Failed to unmarshal cache: %v", err)
return fmt.Errorf("failed to unmarshal cache: %w", err)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning a wrapped error fmt.Errorf("failed to unmarshal cache: %w", err) is good practice.


// Ensure directory exists
if err := os.MkdirAll(config.AppConfig.LocalAnchorDir, 0755); err != nil {
LogError("SaveHashCache: Failed to create directory: %v", err)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LogError call duplicates the error return. The caller will also receive the error.

}

fileHashCacheMu.RLock()
data, err := json.MarshalIndent(fileHashCache, "", " ")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using json.MarshalIndent for cache files can lead to larger file sizes and slower read/write times for very large caches.

fileHashCacheMu.RUnlock()

if err != nil {
LogError("SaveHashCache: Failed to marshal cache: %v", err)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LogError call duplicates the error return. The caller will also receive the error.

}

if err := os.WriteFile(cachePath, data, 0644); err != nil {
LogError("SaveHashCache: Failed to write file: %v", err)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LogError call duplicates the error return. The caller will also receive the error.

logPath := filepath.Join(appConfigDir, "b2m.log")
// User requested logs in current directory
// logPath := "b2m.log"
logPath := "b2m.log"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Placing the log file directly in the current working directory might cause issues if the application is run from a read-only location or if multiple instances are run from different directories.

if err := cmd.Run(); err != nil {
LogError("DownloadAndLoadMetadata: rclone sync failed: %v", err)
// Use RcloneSync helper
if err := RcloneSync(config.AppConfig.VersionDir, config.AppConfig.LocalVersionDir); err != nil {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The RcloneSync helper should handle the 'remote metadata directory not found' case gracefully, possibly returning nil or an empty map if the remote is truly empty.

return result, nil
}

// FetchSingleRemoteMetadata downloads and parses the metadata for a specific DB from the remote version dir

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new function FetchSingleRemoteMetadata is a good addition for targeted metadata retrieval, avoiding full syncs.

func FetchSingleRemoteMetadata(ctx context.Context, dbName string) (*model.Metadata, error) {
// Get file ID (db name without .db extension). Use simple string trimming;
// if dbName doesn't end in .db, fileID will equal dbName.
fileID := strings.TrimSuffix(dbName, ".db")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This TrimSuffix logic is fragile. If dbName is mydb.db.backup, fileID becomes mydb.db.backup. Consider a more robust way to derive fileID if it needs to be a specific identifier.

}

// Use RcloneCopy (it takes a file source and a directory destination usually, but let's check RcloneCopy impl)
// RcloneCopy signature: func RcloneCopy(ctx context.Context, src, dst, description string, quiet bool, onProgress func(model.RcloneProgress)) error

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment about RcloneCopy signature is helpful but should be removed or updated once the RcloneCopy implementation is finalized.

// "Copy the source to the destination. Doesn't transfer unchanged files."

// Let's use RcloneCopy with quiet=true.
if err := RcloneCopy(ctx, "copy", remotePath, localDir, "Fetching metadata", true, nil); err != nil {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error message failed to fetch remote metadata is generic. Include dbName for better context.

data, err := os.ReadFile(localFile)
if err != nil {
if os.IsNotExist(err) {
return nil, nil // Metadata doesn't exist remotely yet (New DB)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning nil, nil if metadata doesn't exist remotely is a valid strategy for new DBs, but ensure callers handle this nil metadata case correctly.

continue
}

// Update local anchor (local-version) to match the new metadata

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a critical addition. After uploading new metadata, the local anchor (local-version) must be updated to reflect the new state.

}

// Update local anchor (local-version) to match the new metadata
// This ensures we don't show "DB Outdated" immediately after generation

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment explains the 'why' for updating the local anchor, which is good.


// Update local anchor (local-version) to match the new metadata
// This ensures we don't show "DB Outdated" immediately after generation
if err := UpdateLocalVersion(dbName, *meta); err != nil {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error handling for UpdateLocalVersion is marked as 'Non-critical but annoying'. Consider if a failure to update the local anchor should halt the batch process or be logged more prominently.

rcloneArgs := []string{"copy",
localPath,
config.AppConfig.RootBucket,
// RcloneCopy copies source to destination using rclone copy/copyto with options

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new RcloneCopy function generalizes rclone copy operations, improving reusability.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants